Smart Paper Notes

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Teven Le Scao , Angela Fan , Christopher Akiki , Ellie Pavlick , Suzana Ilić , Daniel Hesslow , Roman Castagné , Alexandra Sasha Luccioni , François Yvon , Matthias Gallé

Categories: Natural Language Processing

2022-11-09

Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.

Translated by Google Translate

The Birth of Bias: A case study on the evolution of gender bias in an English language model

Oskar van der Wal , Jaap Jumelet , Katrin Schulz , Willem Zuidema

Categories: Natural Language Processing | Artificial Intelligence

2022-07-21

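Detecting and mitigating harmful biases in modern language models are widely recognized as crucial open problems. In this paper, we take a step back and investigate how language models come to be biased in the first place. We use a relatively small language model with an LSTM architecture, trained on an English Wikipedia corpus. With full access to the data and to the model parameters as they change at every step of training, we can map in detail how the representation of gender develops, which patterns in the dataset drive this, and how the model's internal state relates to the bias in a downstream task (semantic textual similarity). We find that the representation of gender is dynamic and identify different phases during training. Furthermore, we show that gender information is represented increasingly locally in the input embeddings of the model and that, as a consequence, debiasing these embeddings can be effective in reducing the downstream bias. Monitoring the training dynamics allows us to detect an asymmetry in how the female and male genders are represented in the input embeddings. This is important, as it may cause naive mitigation strategies to introduce new undesirable biases. We discuss the relevance of the findings for mitigation strategies more generally, and the prospects of generalizing our methods to larger language models, the Transformer architecture, other languages, and other undesirable biases.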

Real-time Hyper-Dimensional Reconfiguration at the Edge using Hardware Accelerators

Indhumathi Kandaswamy , Saurabh Farkya , Zachary Daniels , Gooitzen van der Wal , Aswin Raghavan , Yuzheng Zhang , Jun Hu , Michael Lomnitz , Michael Isnardi , David Zhang

Categories: Computer Vision

2022-06-10

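In this paper, we present Hyper-Dimensional Reconfigurable Analytics at the Tactical Edge (HyDRATE), which uses low-SWaP embedded hardware to perform real-time reconfiguration at the edge, leveraging non-MAC deep neural networks (DNNs, free of floating-point multiply-accumulate operations) combined with hyperdimensional (HD) computing accelerators. We describe the algorithm, the generation of the trained quantized model, and the simulated performance of a feature extractor, free of multiply-accumulates, that feeds a hyperdimensional logic-based classifier. We then show how performance increases with the number of hyperdimensions. We describe the realized low-SWaP FPGA hardware and embedded software system in comparison with traditional DNNs, and detail the implemented hardware accelerators. We discuss the measured system latency and power, the noise robustness resulting from learnable quantization and HD computing, the actual versus simulated system performance on a video activity classification task, and a demonstration of reconfiguration on the same dataset. We show that reconfigurability in the field is achieved by retraining only the feed-forward HD classifier, without gradient-descent backpropagation (gradient-free), using few-shot learning of new classes at the edge. The initial work used an LRCN DNN and has since been extended to a two-stream DNN with improved performance.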
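
The gradient-free, few-shot reconfiguration mentioned in the abstract above can be illustrated with a generic hyperdimensional classifier. The following is a minimal sketch, not the HyDRATE implementation: the dimensionality, the noise level, the class names, and the use of random bipolar hypervectors in place of DNN-extracted features are all assumptions made for illustration.

```python
import random

DIM = 2048  # number of hyperdimensions (illustrative choice)


def random_hv(rng):
    """Random bipolar hypervector with entries in {-1, +1}."""
    return [rng.choice((-1, 1)) for _ in range(DIM)]


def noisy(hv, rng, flip_prob=0.2):
    """Copy of `hv` with each entry flipped with probability `flip_prob`."""
    return [-v if rng.random() < flip_prob else v for v in hv]


def bundle(hvs):
    """Element-wise majority vote over hypervectors (ties resolve to +1)."""
    return [1 if sum(column) >= 0 else -1 for column in zip(*hvs)]


def classify(prototypes, hv):
    """Return the label whose prototype has the largest dot product with `hv`."""
    return max(prototypes,
               key=lambda label: sum(a * b for a, b in zip(prototypes[label], hv)))


rng = random.Random(0)

# Hypothetical stand-ins for encoded features of three activity classes.
concepts = {name: random_hv(rng) for name in ("walk", "run", "jump")}

# Initial classifier: one prototype per known class, bundled from 5 examples.
prototypes = {name: bundle([noisy(concepts[name], rng) for _ in range(5)])
              for name in ("walk", "run")}

# Reconfiguration: add a new class from 5 examples, with no gradient updates.
prototypes["jump"] = bundle([noisy(concepts["jump"], rng) for _ in range(5)])

predicted = classify(prototypes, noisy(concepts["jump"], rng))
```

Adding a class only requires bundling a handful of example hypervectors into a new prototype, which is why no backpropagation is needed in this sketch.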

Meta-learning generalizable dynamics from trajectories

Qiaofeng Li , Tianyi Wang , Vwani Roychowdhury , M. Khalid Jawed

Categories: Machine Learning

2023-01-03

We present the interpretable meta neural ordinary differential equation (iMODE) method to rapidly learn generalizable (i.e., not parameter-specific) dynamics from trajectories of multiple dynamical systems that vary in their physical parameters. The iMODE method learns meta-knowledge, the functional variations of the force field of dynamical system instances without knowing the physical parameters, by adopting a bi-level optimization framework: an outer level capturing the common force field form among studied dynamical system instances and an inner level adapting to individual system instances. A priori physical knowledge can be conveniently embedded in the neural network architecture as inductive bias, such as conservative force field and Euclidean symmetry. With the learned meta-knowledge, iMODE can model an unseen system within seconds and inversely reveal knowledge of the physical parameters of a system, acting as a Neural Gauge to "measure" the physical parameters of an unseen system from observed trajectories. We test the validity of the iMODE method on bistable, double pendulum, Van der Pol, Slinky, and reaction-diffusion systems.

Online Real-time Learning of Dynamical Systems from Noisy Streaming Data

S. Sinha , Sai P. Nandanoori , David Barajas-Solano

Categories: Machine Learning

2022-12-10

Recent advancements in sensing and communication facilitate obtaining high-frequency real-time data from various physical systems like power networks, climate systems, biological networks, etc. However, since the data are recorded by physical sensors, it is natural that the obtained data is corrupted by measurement noise. In this paper, we present a novel algorithm for online real-time learning of dynamical systems from noisy time-series data, which employs the Robust Koopman operator framework to mitigate the effect of measurement noise. The proposed algorithm has three main advantages: a) it allows for online real-time monitoring of a dynamical system; b) it obtains a linear representation of the underlying dynamical system, thus enabling the user to use linear systems theory for analysis and control of the system; c) it is computationally fast and less intensive than the popular Extended Dynamic Mode Decomposition (EDMD) algorithm. We illustrate the efficiency of the proposed algorithm by applying it to identify the Van der Pol oscillator, the IEEE 68 bus system, and a ring network of Van der Pol oscillators.
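
To make the "linear representation" in point b) concrete: when the dictionary of observables is just the state itself, EDMD reduces to an ordinary least-squares fit of a linear map between consecutive snapshots. The sketch below shows only that noise-free, batch special case for a 2-D state; it is not the Robust Koopman algorithm proposed in the paper, and the matrix `TRUE_M`, the function names, and the trajectory length are hypothetical choices for illustration.

```python
def simulate(m, x0, steps):
    """Trajectory of the 2-D linear system x_{k+1} = m @ x_k."""
    trajectory = [x0]
    for _ in range(steps):
        x = trajectory[-1]
        trajectory.append((m[0][0] * x[0] + m[0][1] * x[1],
                           m[1][0] * x[0] + m[1][1] * x[1]))
    return trajectory


def fit_linear_map(snapshots):
    """Least-squares M minimising sum ||x_next - M x||^2 over consecutive snapshots."""
    g = [[0.0, 0.0], [0.0, 0.0]]  # G = sum of x x^T
    c = [[0.0, 0.0], [0.0, 0.0]]  # C = sum of x_next x^T
    for x, x_next in zip(snapshots, snapshots[1:]):
        for i in range(2):
            for j in range(2):
                g[i][j] += x[i] * x[j]
                c[i][j] += x_next[i] * x[j]
    # Closed-form inverse of the 2x2 matrix G.
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    g_inv = [[g[1][1] / det, -g[0][1] / det],
             [-g[1][0] / det, g[0][0] / det]]
    # M = C G^{-1}
    return [[sum(c[i][k] * g_inv[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


# Hypothetical ground-truth dynamics: a slowly rotating, decaying system.
TRUE_M = [[0.9, 0.1], [-0.1, 0.9]]
trajectory = simulate(TRUE_M, (1.0, 0.0), 50)
estimated = fit_linear_map(trajectory)
```

With noisy measurements this plain least-squares estimate becomes biased, which is the problem the robust formulation in the abstract is meant to address.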

PRISM: Probabilistic Real-Time Inference in Spatial World Models

Atanas Mirchev , Baris Kayalibay , Ahmed Agha , Patrick van der Smagt , Daniel Cremers , Justin Bayer

Categories: Machine Learning | Computer Vision | Robotics | Machine Learning (Statistics)

2022-12-06

We introduce PRISM, a method for real-time filtering in a probabilistic generative model of agent motion and visual perception. Previous approaches either lack uncertainty estimates for the map and agent state, do not run in real-time, do not have a dense scene representation or do not model agent dynamics. Our solution reconciles all of these aspects. We start from a predefined state-space model which combines differentiable rendering and 6-DoF dynamics. Probabilistic inference in this model amounts to simultaneous localisation and mapping (SLAM) and is intractable. We use a series of approximations to Bayesian inference to arrive at probabilistic map and state estimates. We take advantage of well-established methods and closed-form updates, preserving accuracy and enabling real-time capability. The proposed solution runs at 10Hz real-time and is similarly accurate to state-of-the-art SLAM in small to medium-sized indoor environments, with high-speed UAV and handheld camera agents (Blackbird, EuRoC and TUM-RGBD).

Adaptive Sequential Surveillance with Network and Temporal Dependence

Ivana Malenica , Jeremy R. Coyle , Mark J. van der Laan , Maya L. Petersen

Categories: Machine Learning (Statistics)

2022-12-05

Strategic test allocation plays a major role in the control of both emerging and existing pandemics (e.g., COVID-19, HIV). Widespread testing supports effective epidemic control by (1) reducing transmission via identifying cases, and (2) tracking outbreak dynamics to inform targeted interventions. However, infectious disease surveillance presents unique statistical challenges. For instance, the true outcome of interest, one's positive infectious status, is often a latent variable. In addition, the presence of both network and temporal dependence reduces the data to a single observation. As testing entire populations regularly is neither efficient nor feasible, standard approaches to testing recommend simple rule-based testing strategies (e.g., symptom-based, contact tracing), without taking into account individual risk. In this work, we study an adaptive sequential design involving $n$ individuals over a period of $\tau$ time-steps, which allows for unspecified dependence among individuals and across time. Our causal target parameter is the mean latent outcome we would have obtained after one time-step, if, starting at time $t$ given the observed past, we had carried out a stochastic intervention that maximizes the outcome under a resource constraint. We propose an Online Super Learner for adaptive sequential surveillance that learns the optimal choice of testing strategies over time while adapting to the current state of the outbreak. Relying on a series of working models, the proposed method learns across samples, through time, or both, based on the underlying (unknown) structure in the data. We present an identification result for the latent outcome in terms of the observed data, and demonstrate the superior performance of the proposed strategy in a simulation modeling a residential university environment during the COVID-19 pandemic.

Navigating causal deep learning

Jeroen Berrevoets , Krzysztof Kacprzyk , Zhaozhi Qian , Mihaela van der Schaar

Categories: Machine Learning

2022-12-01

Causal deep learning (CDL) is a new and important research area in the larger field of machine learning. With CDL, researchers aim to structure and encode causal knowledge in the extremely flexible representation space of deep learning models. Doing so will lead to more informed, robust, and general predictions and inference -- which is important! However, CDL is still in its infancy. For example, it is not clear how we ought to compare different methods as they are so different in their output, the way they encode causal knowledge, or even how they represent this knowledge. This is a living paper that categorises methods in causal deep learning beyond Pearl's ladder of causation. We refine the rungs in Pearl's ladder, while also adding a separate dimension that categorises the parametric assumptions of both input and representation, arriving at the map of causal deep learning. Our map covers machine learning disciplines such as supervised learning, reinforcement learning, generative modelling and beyond. Our paradigm is a tool which helps researchers to: find benchmarks, compare methods, and most importantly: identify research gaps. With this work we aim to structure the avalanche of papers being published on causal deep learning. While papers on the topic are being published daily, our map remains fixed. We open-source our map for others to use as they see fit: perhaps to offer guidance in a related works section, or to better highlight the contribution of their paper.

Finding Front-Door Adjustment Sets in Linear Time

Marcel Wienöbst , Benito van der Zander , Maciej Liśkiewicz

Categories: Artificial Intelligence | Machine Learning

2022-11-29

Front-door adjustment is a classic technique to estimate causal effects from a specified directed acyclic graph (DAG) and observed data. The advantage of this approach is that it uses observed mediators to identify causal effects, which is possible even in the presence of unobserved confounding. While the statistical properties of the front-door estimation are quite well understood, its algorithmic aspects remained unexplored for a long time. Recently, Jeong, Tian, and Bareinboim [NeurIPS 2022] have presented the first polynomial-time algorithm for finding sets satisfying the front-door criterion in a given DAG, with an $O(n^3(n+m))$ run time, where $n$ denotes the number of variables and $m$ the number of edges of the graph. In our work, we give the first linear-time, i.e. $O(n+m)$, algorithm for this task, which thus reaches the asymptotically optimal time complexity, as the size of the input is $\Omega(n+m)$. We also provide an algorithm to enumerate all front-door adjustment sets in a given DAG with delay $O(n(n + m))$. These results improve the algorithms by Jeong et al. [2022] for the two tasks by a factor of $n^3$, respectively.
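
As background for the first sentences of the abstract above, the front-door adjustment formula itself is $P(y \mid do(x)) = \sum_z P(z \mid x) \sum_{x'} P(y \mid x', z) P(x')$ for a mediator set $Z$ satisfying the front-door criterion. The sketch below evaluates this formula on a small binary model with a hidden confounder and compares it with the ground truth. It illustrates the estimand only, not the paper's linear-time search algorithm; the graph (U → X, U → Y, X → Z, Z → Y) and all probabilities are made-up assumptions.

```python
from itertools import product

# Hypothetical toy model: U -> X, U -> Y (U unobserved), X -> Z, Z -> Y.
# All variables are binary; every number below is made up for illustration.
P_U1 = 0.5
P_X1_GIVEN_U = {0: 0.2, 1: 0.8}
P_Z1_GIVEN_X = {0: 0.1, 1: 0.9}
P_Y1_GIVEN_ZU = {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9}


def bern(p_one, value):
    """Probability that a binary variable with P(1) = p_one takes `value`."""
    return p_one if value == 1 else 1.0 - p_one


def observed_joint():
    """Observed distribution P(x, z, y), with the confounder U summed out."""
    joint = {}
    for u, x, z, y in product((0, 1), repeat=4):
        pr = (bern(P_U1, u) * bern(P_X1_GIVEN_U[u], x)
              * bern(P_Z1_GIVEN_X[x], z) * bern(P_Y1_GIVEN_ZU[(z, u)], y))
        joint[(x, z, y)] = joint.get((x, z, y), 0.0) + pr
    return joint


def prob(joint, x=None, z=None, y=None):
    """Marginal probability of the given values; None means summed out."""
    return sum(pr for (a, b, c), pr in joint.items()
               if x in (None, a) and z in (None, b) and y in (None, c))


def front_door(joint, x, y):
    """P(Y=y | do(X=x)) from observed data only, via the front-door formula."""
    total = 0.0
    for z in (0, 1):
        p_z_given_x = prob(joint, x=x, z=z) / prob(joint, x=x)
        inner = sum(prob(joint, x=x2, z=z, y=y) / prob(joint, x=x2, z=z)
                    * prob(joint, x=x2) for x2 in (0, 1))
        total += p_z_given_x * inner
    return total


def ground_truth(x, y):
    """P(Y=y | do(X=x)) computed with access to the hidden confounder U."""
    return sum(bern(P_U1, u) * bern(P_Z1_GIVEN_X[x], z)
               * bern(P_Y1_GIVEN_ZU[(z, u)], y)
               for u in (0, 1) for z in (0, 1))


joint = observed_joint()
effect = front_door(joint, x=1, y=1)
```

In this toy graph `front_door` never touches `U`, yet it agrees with `ground_truth`, which is the sense in which observed mediators identify the effect despite unobserved confounding.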

CLAS: Coordinating Multi-Robot Manipulation with Central Latent Action Spaces

Elie Aljalbout , Maximilian Karl , Patrick van der Smagt

Categories: Robotics | Machine Learning | Neural and Evolutionary Computing

2022-11-28

Multi-robot manipulation tasks involve various control entities that can be separated into dynamically independent parts. A typical example of such real-world tasks is dual-arm manipulation. Learning to naively solve such tasks with reinforcement learning is often unfeasible due to the sample complexity and exploration requirements growing with the dimensionality of the action and state spaces. Instead, we would like to handle such environments as multi-agent systems and have several agents control parts of the whole. However, decentralizing the generation of actions requires coordination across agents through a channel limited to information central to the task. This paper proposes an approach to coordinating multi-robot manipulation through learned latent action spaces that are shared across different agents. We validate our method in simulated multi-robot manipulation tasks and demonstrate improvement over previous baselines in terms of sample efficiency and learning performance.
